A Generalized Serverless Event-Driven Architecture for Automated Receipt Processing and Data Extraction

Authors: Shravan Digambar Komejwar, Tushar Ashok Khedkar, Ganesh Vijay Kadam, Prajyot Randhir Barsing, Nitin Talhar

DOI Link: https://doi.org/10.22214/ijraset.2026.78878

Abstract

Receipt volumes do not follow a schedule: submissions may be quiet for two days and then arrive thirty at once because an expense report is due. A server running 24/7 for such a workload is mostly wasted cost. The approach in this paper is based on serverless computing principles, in which resources remain inactive between executions. When data arrives, processing is triggered automatically, structured data is stored in a database, and users are notified. The architecture consists of multiple functional layers, and input-process-output (IPO) notation tracks what enters and exits each stage, which keeps the flow auditable without overcomplicating it. Pay-per-invocation billing was chosen over reserved instances because idle capacity between bursts is not billed. Subtotal aggregation, tax application, and final transaction computation are specified in the methodology. Full runtime benchmarking is outside the scope of this paper because it requires its own controlled test environment; the work focuses on architectural design, and detailed implementation and performance evaluation are left to future work.

Introduction

Receipt processing at scale faces challenges due to inconsistent document formats, uneven submission volumes, and manual-entry bottlenecks. Traditional server-based systems struggle with cost, idle capacity, and synchronous processing, while relational schemas cannot easily handle receipts with varying structures.

The proposed solution uses a serverless, event-driven architecture with asynchronous OCR and field extraction, allowing immediate upload confirmation while processing occurs in the background. NoSQL databases store schema-less data, accommodating receipts of any format without preprocessing. Modular layers isolate failures, simplify maintenance, and enable independent scaling.
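The decoupling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory queue and dictionary stand in for a managed event trigger and a NoSQL table, and the names `handle_upload`, `stub_extract_fields`, and `process_next_event` are hypothetical. The OCR step is a stub that returns fixed fields.

```python
import queue
import uuid

# In-memory stand-ins (assumptions): a real deployment would use a storage
# event or managed queue and a NoSQL table instead of these objects.
events = queue.Queue()
store = {}


def handle_upload(object_key):
    """Acknowledge the upload immediately and defer the slow OCR work."""
    receipt_id = str(uuid.uuid4())
    store[receipt_id] = {
        "receipt_id": receipt_id,
        "object_key": object_key,
        "status": "PENDING",
    }
    events.put({"receipt_id": receipt_id, "object_key": object_key})
    return {"receipt_id": receipt_id, "status": "ACCEPTED"}


def stub_extract_fields(object_key):
    """Placeholder for OCR and field extraction (hypothetical output).

    Different receipts yield different fields, which the schema-less
    store accepts without any table migration.
    """
    if object_key.endswith("fuel.jpg"):
        return {"merchant": "Fuel Stop", "litres": 31.5, "total": "48.20"}
    return {
        "merchant": "Cafe",
        "items": [{"name": "coffee", "unit_price": "3.50", "quantity": 1}],
        "total": "3.50",
    }


def process_next_event():
    """Background worker step: consume one event and persist the result."""
    try:
        event = events.get_nowait()
    except queue.Empty:
        return False
    record = store[event["receipt_id"]]
    try:
        record.update(stub_extract_fields(event["object_key"]))
        record["status"] = "PROCESSED"
    except Exception as exc:
        # A failure is recorded on this receipt only; other events continue.
        record["status"] = "FAILED"
        record["error"] = str(exc)
    return True
```

In this sketch the caller receives the `ACCEPTED` acknowledgment before any extraction runs, and the record moves from `PENDING` to `PROCESSED` only when the worker step is invoked later.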

Key advantages include automatic scaling, pay-per-use billing, fault isolation, and decoupled frontend-backend processing, improving responsiveness, reliability, and efficiency. Pre-signed URLs let clients upload directly to cloud storage, triggering downstream processing automatically, and analytics compute subtotals, taxes, and totals once structured data is extracted.
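As a rough illustration of the analytics step, the sketch below computes a subtotal, tax, and total from extracted line items. The formulas (subtotal as the sum of unit price times quantity, tax as subtotal times a flat rate, total as subtotal plus tax) are assumptions made for this example, as are the field names `unit_price` and `quantity` and the function name `summarize`; the paper's methodology may define them differently.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def summarize(items, tax_rate):
    """Compute subtotal, tax, and total for a list of extracted line items.

    Assumed formulas (not taken from the paper):
      subtotal = sum(unit_price * quantity)
      tax      = subtotal * tax_rate, rounded half-up to two decimals
      total    = subtotal + tax
    Prices and the rate are passed as strings so Decimal avoids float error.
    """
    subtotal = sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in items),
        Decimal("0"),
    ).quantize(CENT, rounding=ROUND_HALF_UP)
    tax = (subtotal * Decimal(tax_rate)).quantize(CENT, rounding=ROUND_HALF_UP)
    total = subtotal + tax
    return {"subtotal": subtotal, "tax": tax, "total": total}


# Hypothetical extracted receipt: two line items and an 8% tax rate.
example = summarize(
    [
        {"unit_price": "4.50", "quantity": 2},
        {"unit_price": "12.00", "quantity": 1},
    ],
    "0.08",
)
```

For the hypothetical receipt above, the subtotal is 21.00, the tax is 1.68, and the total is 22.68. Decimal arithmetic is used here because binary floating point can introduce rounding drift in currency sums.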

In short, the system provides flexible, scalable, and cost-efficient receipt processing with minimal user delay and robust error handling.

Conclusion

Traditional receipt pipelines have two expensive problems: servers that run idle and OCR that holds up user responses. The proposed approach addresses both. Pay-per-use compute means there is no charge for downtime, and asynchronous OCR means the upload is confirmed before processing even starts. Getting those two things to work together reliably is where the actual implementation effort goes. The design can also be generalized to other document processing applications, since irregular document volumes with variable formats arise in expense management, invoice processing, and document archiving. The combination of stateless functions, schema-free storage, and managed ML carries over to most of those problems without much reworking. Further validation can be conducted through real-world evaluation.


Copyright

Copyright © 2026 Shravan Digambar Komejwar, Tushar Ashok Khedkar, Ganesh Vijay Kadam, Prajyot Randhir Barsing, Nitin Talhar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET78878

Publish Date : 2026-03-27

ISSN : 2321-9653

Publisher Name : IJRASET
